local subspace projection
Object based Scene Representations using Fisher Scores of Local Subspace Projections
Several works have shown that deep CNN classifiers can be easily transferred across datasets, e.g. the transfer of a CNN trained to recognize objects on ImageNET to an object detector on Pascal VOC. Less clear, however, is the ability of CNNs to transfer knowledge across tasks. A common example of such transfer is the problem of scene classification that should leverage localized object detections to recognize holistic visual concepts. While this problem is currently addressed with Fisher vector representations, these are now shown ineffective for the high-dimensional and highly non-linear features extracted by modern CNNs. It is argued that this is mostly due to the reliance on a model, the Gaussian mixture of diagonal covariances, which has a very limited ability to capture the second order statistics of CNN features.
Reviews: Object based Scene Representations using Fisher Scores of Local Subspace Projections
There is no theoretical justification of why MFA outperforms FV on transfer learning form object level to holistic scene descriptor. The main argument of the paper about "... inability of the standard GMM ... to provide good approximation ..." in L73-75 needs proof or reference to appropriate literature rather than only experiment results. It needs to clarify why full covariance in MFA is the key to transfer learning problem on CNN features. I reckon it as a week argument although it was considered as second contribution of the paper because; any other dictionary learning method with full covariance should generate the same improvement as MFA according to authors' reasoning. An experienced reader is already aware of these formulations; hence it is expected to see the focus of formulation towards main claims which I could not see them there.
Object based Scene Representations using Fisher Scores of Local Subspace Projections
Dixit, Mandar D., Vasconcelos, Nuno
Several works have shown that deep CNN classifiers can be easily transferred across datasets, e.g. the transfer of a CNN trained to recognize objects on ImageNET to an object detector on Pascal VOC. Less clear, however, is the ability of CNNs to transfer knowledge across tasks. A common example of such transfer is the problem of scene classification that should leverage localized object detections to recognize holistic visual concepts. While this problem is currently addressed with Fisher vector representations, these are now shown ineffective for the high-dimensional and highly non-linear features extracted by modern CNNs. It is argued that this is mostly due to the reliance on a model, the Gaussian mixture of diagonal covariances, which has a very limited ability to capture the second order statistics of CNN features.